Abstract. Although information content is invariant up to an additive constant, the range of
possible additive constants applicable to programming languages is so large that in practice it plays a
major role in the actual evaluation of K(s), the Kolmogorov-Chaitin complexity of a string s. Some
attempts have been made to arrive at a framework stable enough for a concrete definition of K,
independent of any constant under a programming language, by appealing to the naturalness of the
language in question. The aim of this paper is to present an approach to overcome the problem by
looking at a set of models of computation converging in output probability distribution such that
naturalness can be inferred, thereby providing a framework for a stable definition of K under the set
of convergent models of computation.



Keywords: algorithmic information theory, program-size complexity. 

1. Introduction 

We will use the term model of computation to refer both to a Turing-complete programming language
and to a specific device such as a universal Turing machine.



*Some of the ideas contained in this paper were developed during H. Zenil's stay as a visiting scholar at Carnegie
Mellon University. He wishes to thank Jeremy Avigad for his support and Kevin Kelly for his valuable comments and
suggestions.
The term natural for a Turing machine or a programming language has been used within several
contexts and with a wide range of meanings. Many of these meanings are related to the expressive
semantic framework of a model of computation. Others refer to how well a model fits with an algorithm
implementation. Previous attempts have been made to arrive at a model of computation stable enough
to define the Kolmogorov-Chaitin complexity of a string independently of the choice of programming
language. These attempts have used, for instance, lambda calculus and combinatory logic [9, 13], appealing
to their naturalness. We provide further tools for determining whether approaches such as these are
natural in the sense of producing the same relative Kolmogorov-Chaitin measures. Our approach is an
attempt to make precise such appeals to the term natural in relation to the Kolmogorov-Chaitin
complexity, and to provide a framework for a stable definition of K that is sufficiently independent of
additive constants.

Definition The Kolmogorov-Chaitin complexity $K_U(s)$ of a string s with respect to a universal
Turing machine U is defined as the binary length of the shortest program p that produces as output the
string s.

$K_U(s) = \min\{|p| : U(p) = s\}$

A major drawback of K is that it is uncomputable [1] because of the undecidability of the halting
problem. Hence the only way to approach K is by compressibility methods. A major criticism brought
forward against K (for example in [?]) is its strong dependence on the choice of programming language.

2. Dependence on additive constants

The following theorem tells us that the definition of Kolmogorov-Chaitin complexity makes sense even 
when it is dependent upon the programming language: 

Theorem (invariance) If $L_1$ and $L_2$ are two universal Turing machines and $K_{L_1}(s)$ and $K_{L_2}(s)$ the
Kolmogorov-Chaitin complexity of a binary string s when $L_1$ or $L_2$ are used respectively, then there exists
a constant $c_{L_1,L_2}$ such that for all binary strings s:

$|K_{L_1}(s) - K_{L_2}(s)| < c_{L_1,L_2}$

In other terms, there is a program $p_1$ for the universal machine $L_1$ that allows $L_1$ to simulate $L_2$. This
$p_1$ is usually called an interpreter or compiler in $L_1$ for $L_2$. Let $p_2$ be the shortest program producing
some string s according to $L_2$. Then the result of chaining together the programs $p_1$ and $p_2$ generates s
in $L_1$. Chaining $p_2$ onto $p_1$ adds only constant length to $p_2$, so there exists a constant c that bounds
the difference in length of the shortest program in $L_1$ from the length of the shortest program in $L_2$ that
generates the arbitrary string s.

However, the constants involved can be arbitrarily large, so that one can even affect the relative order
relation of K under two different universal Turing machines: if $s_1$ and $s_2$ are two different
strings and $K(s_1) < K(s_2)$, one can construct an alternative universal machine that not only changes the
values of $K(s_1)$ and $K(s_2)$ but reverses the order relation to $K(s_1) > K(s_2)$.

One of the first conclusions drawn from algorithmic information theory is that at least one among
the $2^n$ binary strings of length n will not be compressible at all. That is because there are only $2^n - 1$
binary programs of length shorter than n. In general, a compressor that shortens strings of length n by
c bits has only $2^{n-c}$ descriptions of length n - c available, so it can compress only a small fraction
of the strings by that amount. It is a straightforward
conclusion that no compressing language can arbitrarily compress all strings at once. The strings a
language can compress depend on the language used, since any string (even a random-looking one)
can in some way be encoded to shorten its description within the language in question, even if a string
compressible under other languages turns out to be incompressible under the new one. So one can
always come up with another language capable of effectively compressing any given string. In other
terms, the value of K(s) for a fixed s can be arbitrarily made up by constructing a suitable programming
language for it. However, one would wish to avoid such artificial constructions by finding distinguished
programming languages which are natural in some technical sense (rather than tailor-made to favor any
particular string) while also preserving the relative values of K for all (or most) $2^n$ binary strings of
length n within any programming language sharing the same order-preserving property.
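
The counting argument above can be checked numerically. The following is a minimal sketch; the function names are ours and do not appear in the experiments. It counts the binary programs shorter than n bits and bounds the fraction of n-bit strings that any language could shorten by at least c bits.

```python
def programs_shorter_than(n: int) -> int:
    """Number of binary strings of length 0, 1, ..., n - 1, i.e. 2^n - 1."""
    return sum(2 ** length for length in range(n))


def max_fraction_compressible(n: int, c: int) -> float:
    """Upper bound on the fraction of n-bit strings that can be
    described with at most n - c bits (pigeonhole argument)."""
    # All descriptions of length at most n - c.
    descriptions = programs_shorter_than(n - c + 1)
    return min(1.0, descriptions / 2 ** n)


if __name__ == "__main__":
    print(programs_shorter_than(4))          # one fewer than the 2^4 strings of length 4
    print(max_fraction_compressible(10, 3))  # stays below 2^-(c-1)
```

Whatever the language, fewer than one string in $2^{c-1}$ can be compressed by c bits, which is why a language can only favor some strings at the expense of others.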

As suggested in [7], suppose that in a programming language $L_1$, the shortest program p that generates
a random-looking string s is almost as long as s itself. One can specify a new programming language
$L_2$ whose universal machine $U_2$ is just like the universal machine $U_1$ for $L_1$ except that, when presented
with a very short program $p_2$, $U_2$ simulates $U_1$ on the long program p, generating s. In other words,
the complexity of p can be "buried" inside of $U_2$ so that it does not show up in the $U_2$ program $p_2$ that
generates s. This arbitrariness makes it hard to find a stable definition of Kolmogorov-Chaitin complexity
unless a theory of natural programming languages is provided, which is unlike the usual definition in
terms of an arbitrary, Turing-complete programming language.

For instance, one can conceive of a universal machine that produces certain strings very often or
very seldom, despite being able to produce any conceivable string given its universality. Let's say that a
universal Turing machine is tailor-made to produce the string $(0)^n$ much less often than any other string
$s \in \{0,1\}^n$. By following the relation of Kolmogorov-Chaitin complexity to the universal distribution [?]
$m(s) = 1/2^{K(s)+O(1)}$, one would conclude that for the said tailor-made construction the string $(0)^n$ is
of greater Kolmogorov-Chaitin complexity than any other, which may seem counterintuitive. This is the
kind of artificial construction one would prefer to avoid, particularly if there is a set of programming
languages whose output distributions converge, such that between two natural programming
languages the additive constant remains small enough to make K invariant under the encoding from one
language to the other, thus yielding stable values of K.
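
To make the relation between output frequency and complexity concrete, here is a small sketch that inverts $m(s) = 1/2^{K(s)+O(1)}$, ignoring the O(1) term. The function name and the frequency values are hypothetical, chosen only to mimic a machine that rarely outputs the all-zeros string.

```python
import math


def k_from_probability(m: float) -> float:
    """Estimate K(s) as -log2 m(s), dropping the additive O(1) term."""
    return -math.log2(m)


# Hypothetical output frequencies of a tailor-made machine (not measured data).
freqs = {"000": 0.01, "010": 0.33, "111": 0.66}

# Strings ordered from lowest to highest estimated complexity.
ranking = sorted(freqs, key=lambda s: k_from_probability(freqs[s]))

if __name__ == "__main__":
    for s in ranking:
        print(s, round(k_from_probability(freqs[s]), 3))
```

Under these made-up frequencies the string 000 ends up ranked as the most complex one, which is exactly the counterintuitive outcome described above.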

The issue of dependence on additive constants often comes up when K is evaluated using a particular
programming language or universal Turing machine. One will always find that the additive constant is
large enough to produce very different values. This is even worse for short strings, in particular those
shorter than the size of the program implementation. One way to overcome the problem of the calculation of K
for short strings was suggested in [2, 3]. It involved building from scratch a prior empirical distribution
of the frequency of the outputs according to a formalism of universal computation. In these experiments,
some of the models of computation explored seemed to converge, up to a certain degree, which led us to
propose a natural definition of K for short strings. That was possible because the additive constant
up to which the output probability distributions converge has a lesser impact on the calculation of K,
particularly for the strings at the top of the classification (thus the most frequent and stable strings). This would
make it possible to establish a stable definition and calculation of K for a set of models of computation
identified as natural, for which the relative orders of K(s) are preserved even for larger strings.

Our attempt differs from previous attempts in that the programs generated by different models may
produce the same relative K regardless of the programming language or the universal Turing machine, and
without these being necessarily compact in terms of size. This is what one would expect a stable definition
of K to work with, even if there were still some additive constants involved.



3. Towards a stable definition of K 

The experiment described in detail in [2] proceeded by analyzing the outputs of two different models of
computation: deterministic Turing machines (TM) and one-dimensional cellular automata (CA). Some of
the methods and techniques followed for enumerating, generating and performing exhaustive searches are
described in further detail in [?]. The Turing machine (TM) model represents the basic framework
underlying many concepts in computer science, including the definition of Kolmogorov-Chaitin
complexity, while the cellular automaton has been widely studied as a particularly interesting model also capable
of universal computation. The descriptions for both TM and CA followed standard formalisms
commonly used in the literature. The Turing machine description consisted of a list of rules (a finite program)
capable of manipulating a linear list of cells, called the tape, using an access pointer called the head. The
directions of the tape are designated right and left. The finite program can be in any one of a finite set of
states Q numbered from 1 to n, with 1 the state at which the machine starts its computation. There is a
distinguished state n + 1 called the halting state, at which the machine halts. Each tape cell can contain a
0 or a 1 (there is no special blank character). Time is discrete and the time instants (steps) are ordered from
0, 1, . . . with 0 the time at which the machine starts its computation. At any time, the head is positioned
over a particular cell. At time 0 the head is situated on a distinguished cell on the tape called the start
cell, and the finite program starts in state 1. At time 0 all cells contain the same symbol, either 0 or
1. A rule can be written in 5-tuple notation as $\{s_i, k_i, s_{i+1}, k_{i+1}, d\}$, where $s_i$ is the scanned
symbol under the head, $k_i$ the state at time t, $s_{i+1}$ the symbol to write at time t + 1, $k_{i+1}$ the state at
time t + 1, and d the head movement, either to the right or to the left, at time t + 1. As usual, a Turing
machine can perform the following operations: 1. write an element from A = {0, 1}; 2. shift the head one
cell left or right; 3. change the state of the finite program to another state in Q. When the machine is
running it executes the above operations at the rate of one operation per step. At the end of a computation
the Turing machine has produced an output described by the contiguous cells on the tape that the head
has visited.
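
The machine model just described can be sketched in a few lines. This is an illustrative simulator under our own assumptions, not the code used for the experiments: the rule table is a dictionary keyed by (state, scanned symbol), the function name `run_tm` and its parameters are hypothetical, and the output is taken to be every cell the head has been positioned over, including the last one.

```python
from collections import defaultdict


def run_tm(rules, n_states, blank=0, max_steps=30):
    """Run a Turing machine from state 1 on a tape filled with `blank`.

    rules maps (state, scanned_symbol) -> (symbol_to_write, next_state, move),
    with move = -1 (left) or +1 (right). State n_states + 1 is the halting state.
    Returns the contiguous tape segment visited by the head, as a string.
    """
    tape = defaultdict(lambda: blank)
    state, head = 1, 0
    leftmost = rightmost = 0
    for _ in range(max_steps):
        if state == n_states + 1:  # halting state reached
            break
        write, state, move = rules[(state, tape[head])]
        tape[head] = write
        head += move
        leftmost, rightmost = min(leftmost, head), max(rightmost, head)
    return "".join(str(tape[i]) for i in range(leftmost, rightmost + 1))


# Example rule table: the classic 2-state busy beaver (an illustration only).
busy_beaver = {
    (1, 0): (1, 2, +1),
    (1, 1): (1, 2, -1),
    (2, 0): (1, 1, -1),
    (2, 1): (1, 3, +1),
}

if __name__ == "__main__":
    print(run_tm(busy_beaver, n_states=2))
```

A dictionary-backed tape avoids fixing a tape length in advance, which is convenient when every machine is cut off after a fixed number of steps.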

An analogous standard description of a one-dimensional cellular automaton was followed. A
one-dimensional cellular automaton is a collection of cells on a grid that evolves through a number of discrete
time steps according to a set of rules based on the states of neighboring cells that are applied in parallel
to each row over time. In a binary cellular automaton, each cell can take only one of two possible
values (0 or 1). When the cellular automaton starts its computation, it applies the rules at row 0. A
neighborhood of m cells means that the rule takes into consideration the value of the cell itself, m cells
to the right and m cells to the left in order to determine the value of the next cell at row n + 1.

For the Turing machines the experiments were performed over the set of 2-state 2-symbol Turing
machines, henceforth denoted as TM(2, 2). There are 4096 different Turing machines according to the
description given above and the formula $(2sk)^{sk}$ derived from the traditional 5-tuple rule description of
a Turing machine, with s the number of symbols and k the number of states. All the machines were then
run for t steps each, first on a tape filled with 0 and once again on a tape filled with 1.
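
The count of 4096 machines can be reproduced by enumerating every rule table. The sketch below assumes, as the formula $(2sk)^{sk}$ suggests, that each of the sk (state, symbol) entries independently chooses a symbol to write, a next state and a direction; the function name is ours.

```python
from itertools import product


def enumerate_rule_tables(n_states=2, n_symbols=2):
    """Yield every rule table as a dict (state, symbol) -> (write, next_state, move)."""
    keys = list(product(range(1, n_states + 1), range(n_symbols)))
    actions = list(product(range(n_symbols), range(1, n_states + 1), (-1, +1)))
    for choice in product(actions, repeat=len(keys)):
        yield dict(zip(keys, choice))


if __name__ == "__main__":
    s, k = 2, 2
    total = sum(1 for _ in enumerate_rule_tables(k, s))
    print(total, (2 * s * k) ** (s * k))
```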

We proceeded in the same fashion for nearest-neighbor cellular automata, henceforth denoted by CA(1),
starting from a single 1 on a background of 0s and from a single 0 on a background of 1s. Since there
are 2 x 2 x 2 = $2^3$ = 8 possible binary states for the three cells neighboring a given cell, there are a total
of $2^8$ = 256 elementary cellular automata or ECA.
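
One update step of an elementary cellular automaton can be sketched as follows. Two assumptions are ours: rules are indexed with the standard Wolfram numbering (bit i of the rule number gives the new value for the neighborhood whose three cells read i in binary), and the infinite background is represented by a single value that is updated alongside a finite row growing by one cell on each side per step.

```python
def eca_step(row, background, rule):
    """Apply one step of elementary CA `rule` (0..255) to a finite row
    embedded in an infinite uniform background."""
    table = [(rule >> i) & 1 for i in range(8)]
    padded = [background] * 2 + list(row) + [background] * 2
    new_row = [
        table[4 * padded[i - 1] + 2 * padded[i] + padded[i + 1]]
        for i in range(1, len(padded) - 1)
    ]
    # A uniform neighborhood (000 or 111) decides the new background value.
    new_background = table[7 * background]
    return new_row, new_background


def eca_run(rule, steps, seed=1, background=0):
    """Evolve a single `seed` cell on a uniform background for `steps` steps."""
    row, bg = [seed], background
    for _ in range(steps):
        row, bg = eca_step(row, bg, rule)
    return row, bg


if __name__ == "__main__":
    print(eca_run(30, 1))
    print(eca_run(90, 2))
```

Tracking the background explicitly matters for the runs that start from a single 0 on a background of 1s, where the background itself may flip at each step.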
Figure 1. The experiments can be summarized by looking at the above diagram comparing two output probability
distributions for strings of length n = 3, after t = n x 10 = 30 steps. Matching strings are linked by a line. As one
can observe, in spite of certain crossings, TM(2, 2) and CA(1) seem to be strongly correlated and both group the
output strings by reversion and complementation symmetries. By taking the six groups (marked with brackets) both
probability distributions make a perfect match. In [2] we provide another example for strings of length n = 4.



Let $s(TM(i), m)$ and $s(CA(j), m)$ be the two sets of output strings produced by the i-th Turing
machine and the j-th cellular automaton respectively after m steps, according to an enumeration for
Turing machines and cellular automata. A probability distribution was built as follows: the sample space
associated with the experiment is $S = \{s \mid s \in \{0,1\}^n\}$, since both $s(TM(i), m)$ and $s(CA(j), m)$
are sets of binary strings. Let's call S the set of outputs either from $s(TM(n), m)$ or $s(CA(n), m)$.
For each $s \in S$ the space of the random variable $X \in S$ is $\{0,1\}^n$. For a discrete variable X, the
probability $\Pr(X = s)$ means the probability f(s) of the random variable X to produce the substring
s. Let $D(X) = \{(s_i, f(s_i))\}$ such that for all $s_i \in S$, $f(s_i) \geq f(s_{i+1})$, where f(s) is the probability of s
being produced. In other words, D(X) is the set of tuples of a string followed by the probability of that
string being produced by a Turing machine or a cellular automaton after m = 10n steps.
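
As an illustration of how D(X) can be assembled from raw outputs, the sketch below counts how often each string appears and sorts the strings by decreasing frequency. The function name and the tie-breaking rule (lexicographic order among equally frequent strings) are our assumptions.

```python
from collections import Counter


def output_distribution(outputs):
    """Return D as a list of (string, probability) pairs in decreasing
    order of probability, given an iterable of output strings."""
    counts = Counter(outputs)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(s, count / total) for s, count in ranked]


if __name__ == "__main__":
    sample = ["00", "00", "11", "01"]  # hypothetical outputs, not experimental data
    for s, p in output_distribution(sample):
        print(s, p)
```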



3.1. Output probability distribution D(X) 

D(X) is a discrete probability distribution since $\sum_u \Pr(X = u) = 1$, as u runs through the set of all
possible values of X, for a finite set of possible binary strings, and the sum of all of them is
exactly 1. D(X), simply denoted as D from now on, was calculated in [2] for two sets of Turing machines
and cellular automata with small state and symbol values up to a certain string length n.

In each case D was found to be stable under several variations such as number of steps and sample
sizes, allowing us to define a stable distribution D for each, denoted from now on as $D_{TM}$ for the
distribution from Turing machines and $D_{CA}$ for the distribution from cellular automata.
3.2. Equivalence of complexity classes

The application of a widely used theorem in group theory may provide further stability, getting rid
of crossings due to exchanged strings, that is, different strings that probably have the same
Kolmogorov-Chaitin complexity but bias the rank comparisons. Ideally, one would group and weight
the frequencies of the strings with the same expected complexity in order to measure the rank correlation
without any additional bias. Consider, for instance, two typical distributions $D_1$ and $D_2$ for which the
calculated frequencies have placed the strings $(0)^n$ and $(1)^n$ at the top of $D_1$ and $D_2$ respectively. If the
ranking distance of both distributions is then calculated, one might get a biased measurement due to the
exchange of $(0)^n$ with $(1)^n$ despite the fact that both should have, in principle, the same
Kolmogorov-Chaitin complexity. Therefore, we want to find out how to group these strings so that they
do not affect the rank comparison.

The Polya-Burnside enumeration theorem [10], which makes it possible to count the number of discrete
combinatorial objects of a given type as a function of their symmetrical cases, was used. We have found
experimentally that the symmetries that are supposed to preserve the Kolmogorov-Chaitin complexity of a string
are reversion (sy), complementation (co) and their compositions (cosy(s) and syco(s)). In all
the distributions built from the experiments so far we have found that strings always tend to group
themselves in contiguous groups with their complemented and reversed versions. That is also a consequence
of the setup of the experiments, since each Turing machine ran from a tape filled with zeros
first and then again from a tape filled with ones in order to avoid any antisymmetry bias. For the same
reason, each cellular automaton ran starting with a 0 on a background of ones and once again with a 1 on
a background of zeros.

Definition (complexity class) Let D be the probability distribution produced by a computation.
A complexity class C in D is the set of strings $\{s_1, s_2, \ldots, s_i\}$ such that
$K(s_1) = K(s_2) = \ldots = K(s_i)$.

The above clearly induces a partition, since $\bigcup_{i=1}^{n} C_i = D$ and $\bigcap_{i=1}^{n} C_i = \emptyset$, for n the number of
classes in D. In other words, all strings in D are in one and only one complexity class. We will denote
by $D_r$ the reduced distribution of D. Evidently the number of elements in D is greater than or equal to that in $D_r$.

The Polya-Burnside enumeration theorem will help us arrive at $D_r$. There are $2^n$ different binary
strings of length n and 4 possible transformations to take into consideration, generated by:

1. id, the identity symmetry, id(s) = s.

2. sy, the reversion symmetry given by: if $s = d_1 d_2 \ldots d_n$, $sy(s) = d_n d_{n-1} \ldots d_1$.

3. co, the complementation symmetry given by applying $mod(d_i + 1, 2)$ to each digit $d_i$ of s.

Let T denote the set of all possible transformations under composition of the above. Besides the three
listed, it contains the composition syco, giving the 4 transformations.

The number of complexity classes can then be obtained by applying the Burnside theorem according to the
following formula:

$(2^n + 2^{n/2} + 2^{n/2})/4$, for n even
$(2^n + 2^{(n+1)/2})/4$ otherwise.
This is obtained by calculating the number of invariant binary strings under T. For the transformation
id there are $2^n$ invariant strings. For sy there are $2^{n/2}$ if n is even and $2^{(n+1)/2}$ if n is odd; the number of
invariant strings under co is zero; and the number of invariant strings under syco is $2^{n/2}$ if n is even, or
zero if it is odd. Let's use B(D) to denote the application of the Burnside theorem to a distribution D.
As a consequence of applying B(D), grouping and adding up the frequencies of the strings, one has
to divide the frequency results by 2 or 4 (depending on the number of strings grouped for each class)
according to the following formula:

$fr(s) / |\{s, sy(s), co(s), syco(s)\}|$

where fr represents the frequency of the string s and the denominator is the cardinality of the set of
strings equivalent to s under T, s itself included.

For example, the string $s_1$ = 0000 for n = 4 is grouped with the string $s_2$ = 1111 because they
both have the same algorithmic complexity: $C_{0000}$ = {0000, 1111}. The index of each class $C_i$ is the
first string in the class according to arithmetical order. Thus the class {0000, 1111} is represented by
$C_{0000}$. Another example of a class with two member strings is the one represented by 0011, the class
$C_{0011}$ = {0011, 1100}. By contrast, the string 0010 has three other strings of length 4 in the same class:
$C_{0010}$ = {0010, 0100, 1011, 1101}. Another class with four members is the one represented by 0001,
$C_{0001}$ = {0001, 0111, 1000, 1110}. In each class, for any two strings $s_i$ and $s_j$ there is a transformation
in T such that $T(s_i) = s_j$, i.e. by applying a transformation in T one can obtain any string in the class
from any other string in it.
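
The grouping can be verified by brute force for small n. The sketch below (function names are ours) builds the class of each string under reversion and complementation, compares the number of classes with the Burnside count derived from the invariant-string counts given earlier, and averages frequencies within a class.

```python
def sy(s):
    """Reversion symmetry."""
    return s[::-1]


def co(s):
    """Complementation symmetry."""
    return "".join("1" if d == "0" else "0" for d in s)


def complexity_class(s):
    """All strings reachable from s by id, sy, co and their composition."""
    return frozenset({s, sy(s), co(s), sy(co(s))})


def classes(n):
    """Set of all symmetry classes among the binary strings of length n."""
    return {complexity_class(format(i, f"0{n}b")) for i in range(2 ** n)}


def burnside_count(n):
    """Number of classes from the counts of invariant strings under T."""
    if n % 2 == 0:
        return (2 ** n + 2 * 2 ** (n // 2)) // 4
    return (2 ** n + 2 ** ((n + 1) // 2)) // 4


def reduced_frequency(freq, s):
    """Total frequency of the class of s divided by the class size."""
    members = complexity_class(s)
    return sum(freq.get(t, 0.0) for t in members) / len(members)


if __name__ == "__main__":
    print(sorted(complexity_class("0010")))
    print(len(classes(4)), burnside_count(4))
```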

It is clear that B induces a total order in $D_r$ from D under the transformations T preserving K,
because if $s_1$, $s_2$ and $s_3$ are strings in $\{0,1\}^n$: if $K(s_1) \leq K(s_2)$ and $K(s_2) \leq K(s_1)$ then $K(s_1) = K(s_2)$,
so $s_1$, $s_2$ are in the same complexity class $C_{s_1,s_2}$ (antisymmetry); if $K(s_1) \leq K(s_2)$ and $K(s_2) \leq K(s_3)$
then $K(s_1) \leq K(s_3)$ (transitivity); and either $K(s_1) \leq K(s_2)$ or $K(s_2) \leq K(s_1)$ (totality).

Hereafter $D_r$ will simply be denoted by D, it being understood that it refers to $D_r$ after
applying B(D).

3.3. Rank order correlation 

To figure out the degree of correlation between the probability distributions, we followed a statistical
method for rank comparisons. Spearman's rank correlation coefficient is a non-parametric measure of
correlation, i.e. it makes no assumptions about the frequency distribution of the variables. Spearman's
rank correlation coefficient is equivalent to the Pearson correlation on ranks. The Spearman coefficient
measures the correspondence between two rankings and allows the significance of this
correspondence to be assessed. The Spearman rank correlation coefficient is:

$\rho = 1 - (6 \sum d_i^2) / (n(n^2 - 1))$

where $d_i$ is the difference between each rank of corresponding values of x and y, and n the number
of pairs of values.

The Spearman coefficient is in the interval [-1, 1] where:

• If the agreement between the two rankings is perfect (i.e., the two rankings are the same) the
coefficient has value 1.

• If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the
other) the coefficient has value -1.

• For all other arrangements the value lies between -1 and 1, and increasing values (for the same 
number of elements) imply increasing agreement between the rankings. 

• If the rankings are completely independent, the coefficient has value 0. 
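
The coefficient and its two extreme cases can be illustrated with a direct implementation of the formula for $\rho$. This sketch assumes rankings without ties, which is the situation in which the formula is exact; the function name is ours.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rank correlation of two equally long rankings without ties."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("rankings must have the same length, at least 2")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


if __name__ == "__main__":
    print(spearman_rho([1, 2, 3, 4], [1, 2, 3, 4]))  # identical rankings
    print(spearman_rho([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed rankings
```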

3.3.1. Level of significance 

The approach to testing whether an observed value of $\rho$ is significantly different from zero is to calculate
the probability that it would be greater than or equal to the observed $\rho$, given the null hypothesis (that
they are correlated by chance), by using a permutation test in order to conclude that the obtained value
of $\rho$ is unlikely to occur by chance.

The level of significance is determined by a permutation test [6], checking all permutations of ranks
in the sample and counting the fraction for which $\rho$ is more extreme than the $\rho$ found from the data.
As the number of permutations grows as N!, this is not practical even for small numbers.
An asymptotically equivalent permutation test can be created when there are too many possible orderings
of the data. For fewer than 9 elements we proceeded by an exact permutation test. For more than 9 elements the
significance was calculated by Monte Carlo sampling, which takes a small (relative to the total number
of permutations) random sample of the possible orderings. In our case the sample size was 10000, big
enough to guarantee the results given the number of elements.
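
Both variants of the test can be sketched as follows. This is an illustration under our own assumptions (one-sided test counting permutations whose $\rho$ is at least the observed one, rankings without ties, a fixed random seed for reproducibility), not the code that generated the significance tables.

```python
import random
from itertools import permutations
from math import factorial


def spearman_rho(rank_x, rank_y):
    """Spearman rank correlation for rankings without ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def exact_p_value(rank_x, rank_y):
    """Fraction of all permutations of rank_y whose rho is at least
    as large as the observed one (feasible only for few elements)."""
    observed = spearman_rho(rank_x, rank_y)
    hits = sum(spearman_rho(rank_x, perm) >= observed
               for perm in permutations(rank_y))
    return hits / factorial(len(rank_y))


def monte_carlo_p_value(rank_x, rank_y, samples=10000, seed=0):
    """Estimate the same fraction from random shuffles of rank_y."""
    rng = random.Random(seed)
    observed = spearman_rho(rank_x, rank_y)
    shuffled = list(rank_y)
    hits = 0
    for _ in range(samples):
        rng.shuffle(shuffled)
        if spearman_rho(rank_x, shuffled) >= observed:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    print(exact_p_value([1, 2], [1, 2]))
    print(monte_carlo_p_value(list(range(1, 11)), list(range(1, 11))))
```

With only two elements a perfect match still has probability one half under the null hypothesis, so no strong significance can be claimed for very short rankings.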

The significance convention is that above 0.05 the correlation might be the product of chance and then
it has to be rejected. If it is 0.05 or below, then there is enough confidence that the correlation has not
occurred by chance and therefore it is said that the correlation is significant. If it is 0.01 or below, then
the correlation is said to be highly significant and very unlikely to be the product of chance, since it would
occur by chance less than 1 time in a hundred.

The significance tables generated and followed for the calculation of the significance of the Spearman
correlation coefficients can be consulted at the following URL:
http://www.mathrix.org/experimentalAIT/spearmantables

3.4. Convergence in distributions 

We want to find out if the probability distributions built from single and different models of computation 
converge. 

Definition (convergence in order) A sequence of distributions $D_1, D_2, \ldots$ converges to
$D_N$ if, for all strings $s_i \in D_n$, $ord(s_i) \in D_n \to ord(s_i) \in D_N$ when n tends to infinity. In
other words, $D_n$ converges to an order when n tends to infinity.

Definition (convergence in values) A sequence of distributions $D_1, D_2, \ldots$ converges to
$D_N$ if, for all strings $s_i \in D_n$, $f(s_i) \in D_n \to f(s_i) \in D_N$ when n tends to infinity.

Definition (order-preserving) A Turing machine N is Kolmogorov-Chaitin complexity
monotone, or Kolmogorov-Chaitin complexity order-preserving if, given the output probability
distribution $D_1$ of N, if $K_{D_1}(s_1) < K_{D_1}(s_2)$ then $K_{D_2}(s_1) < K_{D_2}(s_2)$.

Figure 2. The above sequence of plots shows the evolution of the probability distributions for both 2-state Turing
machines and one-dimensional elementary cellular automata, arranging the strings (x axis) in arithmetical order
to compare the frequency value (y axis) of equal output strings produced by each TM(2, 2) and CA(1). n is the
length of the strings to compare with, but also determines how far a machine runs, in number of steps t = 10 x n,
and how many machines are sampled, determined by a = n x 341 for TM(2, 2) and a = n x 21 for CA(1),
with a the size of the sample, so that a = 12 x 341 = 4092 and a = 12 x 21 = 252 give the closest whole numbers
to the total number of machines in TM(2, 2) and CA(1) respectively. n is, in other words, what lets us define the
progression of the sequence to look for the convergence in distribution. Our claim is that when n tends to infinity
the distributions converge either in order or in values to a limit distribution, as we will formulate in section 3.4.



Definition (quasi order preserving) A Turing machine N is c-Kolmogorov-Chaitin complexity
monotone, or c-Kolmogorov-Chaitin complexity order-preserving if, for most strings, N is
Kolmogorov-Chaitin complexity monotone, or Kolmogorov-Chaitin complexity order-preserving. A
Turing machine N that is 0.01-Kolmogorov-Chaitin complexity order-preserving is said to be
Kolmogorov-Chaitin complexity order-preserving.



In order to determine the degree of order preservation we have introduced the term c, which will be
determined by the correlation significance between two given output probability distributions $D_1$ and
$D_2$.

In other words, one can still define a monotony measure even if only a significant first segment of
the distributions converges. This is important because by algorithmic probability we know that
random-looking strings will be (and because of their random nature have to be) very unstable, exchanging places
at the bottom of the distributions. But we may nevertheless want to know whether a distribution
converges for most of the strings.



Whether or not a probability distribution D converges to $D_N$, one might still want to check whether two
different models of computation converge to each other:



Definition (relative Kolmogorov-Chaitin monotony) Let M and N be two Turing machines.
M and N are relatively c-Kolmogorov-Chaitin complexity monotone if, given their probability
distributions $D_1$ and $D_2$ respectively, $K_{D_2}(s_1) < K_{D_2}(s_2)$ implies $K_{D_1}(s_1) < K_{D_1}(s_2)$ in $D_1$ for all
$f(s_1), f(s_2) > c$.
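
A simple way to probe this definition on two empirical distributions is to count the pairs of strings whose relative order agrees. The sketch below is ours and rests on two assumptions: a higher output frequency stands for a lower K (by the relation between K and the universal distribution), and c is read as a frequency threshold as in the condition $f(s_1), f(s_2) > c$. The example frequencies are invented for illustration.

```python
from itertools import combinations


def order_preserved_fraction(d1, d2, c=0.0):
    """Fraction of string pairs, strictly ordered in d2, that keep the
    same relative order in d1. d1 and d2 map strings to frequencies;
    only strings with frequency above c in both are considered."""
    shared = [s for s in d1 if s in d2 and d1[s] > c and d2[s] > c]
    pairs = [(a, b) for a, b in combinations(shared, 2) if d2[a] != d2[b]]
    if not pairs:
        return 1.0
    kept = sum((d1[a] - d1[b]) * (d2[a] - d2[b]) > 0 for a, b in pairs)
    return kept / len(pairs)


if __name__ == "__main__":
    d_tm = {"00": 0.5, "01": 0.3, "10": 0.2}  # hypothetical frequencies
    d_ca = {"00": 0.6, "01": 0.1, "10": 0.3}  # hypothetical frequencies
    print(order_preserved_fraction(d_tm, d_ca))
```

A value of 1 means that every comparable pair keeps its order, i.e. the two models are relatively monotone on the strings considered; lower values give a crude degree of order preservation.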



Definition (distribution length) Given a model M, the length of its output probability
distribution D, denoted by |D|, is the length of the largest string $s \in D$.



Main result TM(2, 2) and CA(1) are relative Kolmogorov-Chaitin complexity quasi monotone
up to |D| = 12.



The following table shows the Spearman rank correlation coefficients for $D_{TM(2,2)}$ with $D_{CA(1)}$
from string lengths 2 to 12:
of chance) decreases very soon, remaining very low, while the significance increases systematically from n = 2 to 12.
Significance values are not expected to score well at the beginning, due to the lack of elements to
determine whether something other than chance produced the order. For 2 elements in each rank order there
are only 2 ways to arrange each rank, and even if they make a perfect match, as they do, the significance
cannot be higher than 50 percent because there is still a one-half chance of having produced that particular
order. This is also the case for 3 elements, even though the ranks made a perfect match as well. But starting
at 6 one can start looking at an actual significance value, and up to 12 in the sequence one can
witness a notable improvement until the value stabilizes at 0.01, which is, for all of them, highly significant.
Just one case was merely significant rather than highly significant according to the threshold convention.

The fact that each of the values of the sequence is either significant or highly significant makes
the convergence of the entire sequence even more significant. $D_{TM(2,2)}$ and $D_{CA(1)}$ are therefore statistically
highly correlated and they are relative 0.01-Kolmogorov-Chaitin complexity quasi monotone up to |D| =
12 in almost all values. Therefore TM(2, 2) and CA(1) are relative Kolmogorov-Chaitin complexity
monotone.
It also turned out that the Pearson correlation coefficients between the actual probability values of
$D_{TM(2,2)}$ and $D_{CA(1)}$ were all highly significant, with the following values:

Number of elements    Pearson coefficient
2                     1
3                     0.624662
6                     0.979218
9                     0.972992
15                    0.95721
14                    0.975683
12                    0.920039
12                    0.942916
12                    0.982229
11                    0.916871
11                    0.944149

The above results are important not only because they show that TM(2, 2) and CA(1) are
Kolmogorov-Chaitin monotone up to |D| = 12, but also because they constitute the basis and evidence for
the formulation of the conjectures in section 3.5.



3.5. Conjectures of convergence 

Let ord denote the ranking order of a distribution D and pr the actual probability values of D for each
string $s \in D$, then:

Conjecture 1 If $pr(D_{TM}(n)) = \{f(s_1), f(s_2), \ldots, f(s_u)\}$, then for all $s_i$, $f(s_i) \to f(L_{s_i})$ when
$n \to \infty$, with $\{f(L_{s_1}), f(L_{s_2}), \ldots, f(L_{s_n}), \ldots\}$ the limit frequencies. In other words, the sequence of
probability values $f(D_{TM}(1)), f(D_{TM}(2)), \ldots, f(D_{TM}(n)), \ldots$ converges when n tends to infinity.
Let's call this limit distribution pr hereafter.

Conjecture 2 The sequence $ord(D_{TM}(1)), ord(D_{TM}(2)), \ldots, ord(D_{TM}(n))$ converges when n
tends to infinity.

Notice that conjecture 2 is weaker than conjecture 1, since conjecture 2 could be true even if
conjecture 1 is false. Both conjectures 1 and 2 imply that there exists a $k \in \{1, 2, \ldots, n\}$ such that for all
i > k, TM(i) is Kolmogorov-Chaitin complexity order-preserving.



Likewise for cellular automata: 
Conjecture 3 The sequence $pr(D_{CA}(1)), pr(D_{CA}(2)), \ldots, pr(D_{CA}(u))$ converges to $pr(D_{CA}(n))$
when n tends to infinity.

Conjecture 4 The sequence $ord(D_{CA}(1)), ord(D_{CA}(2)), \ldots, ord(D_{CA}(n))$ converges when n
tends to infinity.

As for Turing machines, conjecture 4 is weaker than conjecture 3: conjecture 3 implies conjecture 4,
but conjecture 4 could be true even if conjecture 3 is false. Both conjectures 3 and 4 imply that there
exists a $k \in \{1, 2, \ldots, n\}$ such that for all i > k, CA(i) is Kolmogorov-Chaitin complexity
order-preserving.

Conjecture 5 $pr(D_{CA}(n)) = pr(D_{TM}(n))$.

Conjecture 6 $ord(D_{CA}(n)) = ord(D_{TM}(n))$.

In other words, the distributions for both CA and TM converge to the same limit distribution.

Conjecture 5 implies conjecture 6, but conjecture 6 could be true even if conjecture 5 is false. 

Both pr and ord define $D_N$, from now on the natural probability distribution. We can now propose our definition of a natural model of computation:

Definition (naturalness in distribution) M is a natural model of computation if it is c-Kolmogorov-Chaitin monotone or c-Kolmogorov-Chaitin order-preserving for c = 0.01.

In other words, any model of computation preserving the relative order of the natural distribution $D_N$ is natural in terms of Kolmogorov-Chaitin complexity under our definition. So one can now technically say that a tailor-made Turing machine producing a sufficiently different output distribution is not natural according to the prior $D_N$. One can now also define (a) a degree of naturalness according to the ranking coefficient and the number of order-preserving strings, as suggested before, and (b) a Kolmogorov-Chaitin order-preserving test, such that one can say whether a programming language or Turing machine is natural by designing an experiment and running the test. For (a) it suffices to follow the ideas in this paper. For (b) one can follow the experiments partially described here, supplemented with further details available in [3], in order to produce a probability distribution that can be compared to the natural probability distribution to determine whether or not convergence occurs. The use of these natural distributions as prior probability distributions is one of the possible applications. The following URL provides the full tables: http://www.mathrix.org/experimentalAIT/naturaldistribution
Further details, including the original programs, are available online on the experimental Algorithmic Information Theory web page: http://www.mathrix.org/experimentalAIT/
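A degree of naturalness along the lines of (a) could be sketched as follows: a rank correlation between a candidate distribution and a reference distribution, together with the number of strings occupying the same rank in both. The use of Spearman's coefficient, the function names and the sample data are all assumptions made for illustration; the sketch does not reproduce the formal c-Kolmogorov-Chaitin criteria.

```python
import math

def average_ranks(dist, keys):
    """Rank of each key under dist (1 = most probable); tied keys share the mean rank."""
    ordered = sorted(keys, key=lambda s: -dist[s])
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and dist[ordered[j + 1]] == dist[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for s in ordered[i:j + 1]:
            result[s] = mean_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)

def degree_of_naturalness(candidate, reference):
    """Return (Spearman rank coefficient, number of strings with identical rank)."""
    common = sorted(set(candidate) & set(reference))
    r_c = average_ranks(candidate, common)
    r_ref = average_ranks(reference, common)
    rho = pearson([r_c[s] for s in common], [r_ref[s] for s in common])
    same_rank = sum(1 for s in common if r_c[s] == r_ref[s])
    return rho, same_rank

# Hypothetical data: a stand-in for the natural distribution and two candidates.
reference = {"0": 0.4, "1": 0.3, "00": 0.2, "11": 0.1}
similar = {"0": 0.45, "1": 0.35, "00": 0.15, "11": 0.05}
tailor_made = {"0": 0.1, "1": 0.2, "00": 0.3, "11": 0.4}

print(degree_of_naturalness(similar, reference))
print(degree_of_naturalness(tailor_made, reference))
```

The first candidate keeps the order of the reference and obtains the maximal coefficient, whereas the tailor-made candidate reverses the order and obtains the minimal one.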

Further experiments are in the process of being performed, both for bigger classes of the same models of computation and for other models of computation, including some that clearly are not Kolmogorov-Chaitin order-preserving. More experiments will be performed covering different parameterizations, such as distributions for non-empty initial configurations, possible rates of convergence and radius of convergence, as well as the actual relation between the mathematical expected values of the theoretical definitions of K(s) and m(s) (the so-called universal distribution [9]), as first suggested in lEJO. We are aware of the possible expected differences between probability distributions produced by self-nondelimiting vs. self-delimiting programs [4], such as in the case discussed within this paper, where the halting state of the Turing machines was partially dismissed while the halting of the cellular automata was randomly chosen to produce the desired length of strings for comparison with the TM distributions. A further investigation suggests the possibility that there are interesting qualitative differences in the probability distributions they produce. These can also be studied using this approach.

[Figure 4: plot titled "TM and CA string output frequency distributions, n = 12". Vertical axis: frequency (log). Horizontal axis: output string. Series: TM(2,2) and CA(1).]

Figure 4. Frequency (log) distributions for TM(2, 2) and CA(1), for n = 12. In this plot no string arrangement was made, unlike figure 2. The rate of growth seems to follow a power law.
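The caption of Figure 4 remarks that the frequencies seem to follow a power law. One simple way to probe such a claim is to fit a straight line in log-log space; the sketch below does this by ordinary least squares. The function name and the synthetic data are assumptions for illustration only, and a good fit of this kind does not by itself establish a power law.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b, done as a linear fit of log y on log x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("all values must be positive to take logarithms")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # exponent: slope of the log-log line
    a = math.exp(my - b * mx)     # prefactor: exp of the intercept
    return a, b

# Synthetic data generated from y = 3 * x**-2 (hypothetical, not the paper's data).
xs = list(range(1, 11))
ys = [3.0 * x ** -2 for x in xs]

a, b = fit_power_law(xs, ys)
print(round(a, 6), round(b, 6))
```

Here xs would stand for the position of each output string on the horizontal axis and ys for its frequency.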

If these conjectures are true, as suggested by our experiments, this procedure is a feasible and effective approach to both m(s) and K(s). Moreover, as suggested in [2], it is a way to approach the Kolmogorov-Chaitin complexity of short strings. Furthermore, statistical approaches might in general be good approaches to the Kolmogorov-Chaitin complexity of strings of any length, as long as the sample is large enough to reach a reasonable level of significance.
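As a rough illustration of how an output frequency distribution leads to complexity estimates, the sketch below assumes the standard coding-theorem relation K(s) = -log2 m(s) + O(1) and substitutes the observed frequency of s for m(s). The function name and the sample of outputs are hypothetical, and the additive constant is ignored.

```python
import math
from collections import Counter

def estimate_complexity(outputs):
    """Estimate K(s) as -log2 of the empirical frequency of each output string s."""
    counts = Counter(outputs)
    total = sum(counts.values())
    return {s: -math.log2(c / total) for s, c in counts.items()}

# Hypothetical sample of strings produced by running many small machines.
outputs = ["0"] * 8 + ["1"] * 4 + ["01"] * 2 + ["0110"] * 2

estimates = estimate_complexity(outputs)
for s in sorted(estimates, key=estimates.get):
    print(s, estimates[s])
```

Strings that are produced more often receive lower estimates, so the ranking order of the distribution directly induces an ordering by estimated complexity.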